ABSTRACT

This statement defines how NetWare features like HotFix and
read-after-write verification behave in general, the extent
of a VADD's role in these features, and how the LANStor ESDI
and SCSI VADDs, in particular, implement these features.

Included are some comments on DGroup memory-space usage and
the Adaptec 2322b ESDI controller.

The word "verification" will be used instead of the longer
phrase "read-after-write verification".



HOTFIX

If a VADD runs into a problem, it notifies NetWare and
describes the problem as either a controller error or a
media error.

For a controller error, NetWare will retry the operation
several times before shutting down the drive and any other
drives that are participating on the same channel.  HotFix
is not invoked because NetWare assumes that if the
controller is failing, reads and writes performed by HotFix
would fail as well.

The HotFix feature is invoked when a VADD reports a media
error;  the controller is working, but for some reason the
data could not be stored or retrieved.  Write operations are
handled simply: a new block (eight 512-byte sectors) from the
redirection pool replaces the bad block.  To HotFix a failed
read operation, NetWare recovers as much of the 8-sector
block as possible by doing single-sector reads.  Any missing
data is taken from the mirror drive, if it is available.
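
The read-recovery steps above can be sketched as follows.
This is only an illustration in Python, not NetWare's actual
code.  The names recover_block, read_sector and
mirror_read_sector are hypothetical, and MediaError stands
in for whatever a VADD would report as a media error.

```python
SECTOR_SIZE = 512
SECTORS_PER_BLOCK = 8  # one HotFix block, per the text above


class MediaError(Exception):
    """Raised by a (hypothetical) sector reader for an unreadable sector."""


def recover_block(read_sector, mirror_read_sector=None):
    """Rebuild one 8-sector block using single-sector reads.

    read_sector(i) returns the bytes of sector i or raises MediaError.
    mirror_read_sector, if supplied, reads the same sector from the
    mirror drive.  Returns (sectors, missing), where missing lists the
    sector indexes that neither drive could supply.
    """
    sectors = [None] * SECTORS_PER_BLOCK
    missing = []
    for i in range(SECTORS_PER_BLOCK):
        try:
            sectors[i] = read_sector(i)
            continue
        except MediaError:
            pass
        # The primary drive failed on this sector; try the mirror if present.
        if mirror_read_sector is not None:
            try:
                sectors[i] = mirror_read_sector(i)
                continue
            except MediaError:
                pass
        missing.append(i)
    return sectors, missing
```

A sector ends up in the missing list only when the primary
read fails and the mirror is absent or also fails on that
sector.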

Usually media errors are detected by the drive's controller.
The VADD consults the controller and, in turn, notifies
NetWare.
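
The two error paths described in this section can be
summarized in a short Python sketch.  It is not the real
NetWare/VADD interface: ErrorKind, handle_request and the
three callbacks are hypothetical names, and MAX_RETRIES is
an assumption, since all we can say is that NetWare retries
"several times".

```python
from enum import Enum


class ErrorKind(Enum):
    CONTROLLER = "controller"
    MEDIA = "media"


MAX_RETRIES = 3  # assumption: the text says only "several times"


def handle_request(attempt, hotfix, shut_down_channel):
    """Run one disk operation the way the text describes.

    attempt() returns None on success or an ErrorKind on failure.
    hotfix() and shut_down_channel() are placeholders for NetWare's
    HotFix redirection and channel shutdown.
    """
    retries = 0
    while True:
        error = attempt()
        if error is None:
            return "ok"
        if error is ErrorKind.MEDIA:
            # The controller works but the data could not be stored or read.
            hotfix()
            return "hotfixed"
        # Controller error: retry, then give up on the whole channel.
        if retries == MAX_RETRIES:
            shut_down_channel()
            return "channel down"
        retries += 1
```

HotFix is never called on the controller-error path, which
mirrors the reasoning above: if the controller is failing,
the reads and writes HotFix needs would fail too.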


READ-AFTER-WRITE VERIFICATION

Theory...

The verification feature supplements the fault-tolerant
nature of NetWare.  Each time data is written, it is read
back and verified for accuracy.  If the verification fails,
NetWare is notified and the HotFix feature is invoked
(because the VADD reports the failure as a media error).

The intent of this feature is to guarantee that data can be
moved reliably from server memory to the disk-system and
recorded properly.  Two aspects of the disk-system are
checked by this feature: media reliability and
communications (cable) integrity.

Sometimes an area of a disk's medium can be written
successfully, yet fail when read at a later time.  Reading
the data back and comparing it immediately after the write
catches many "soft" errors of this type.  Some disk-system
hardware can perform this function by itself, independent of
a VADD, O/S or anything else.

The cables used in a disk-system are no less important than
the disk drives themselves.  To check their reliability,
data must make a complete circuit from server memory to the
disk-system and back again.  It is crucial that data be
brought back into server memory so it can be compared to the
original.  It is for this reason that disk-system hardware
that performs verification by itself does only half the job
unless the data is somehow brought back into server memory.
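
The full round trip can be illustrated with a small Python
sketch.  FakeDisk is a hypothetical in-memory stand-in for
the whole disk-system (cable, controller and medium), and
write_verified is not any VADD's actual routine; it only
shows the write, read-back and compare sequence, with the
comparison done against the original in "server memory".

```python
class FakeDisk:
    """Hypothetical disk-system that can corrupt chosen blocks on write."""

    def __init__(self, corrupt_blocks=()):
        self.blocks = {}
        self.corrupt_blocks = set(corrupt_blocks)

    def write(self, block_no, data):
        if block_no in self.corrupt_blocks:
            # Simulate a bad spot on the medium or noise on the cable
            # by flipping every bit of the first byte.
            data = bytes([data[0] ^ 0xFF]) + data[1:]
        self.blocks[block_no] = data

    def read(self, block_no):
        return self.blocks[block_no]


def write_verified(disk, block_no, data):
    """Write a block, read it back, and compare it to the original.

    Returns "ok" or "media error"; the latter is what the VADD would
    report so that HotFix is invoked.
    """
    disk.write(block_no, data)
    readback = disk.read(block_no)  # the data returns to server memory
    return "ok" if readback == data else "media error"
```

Because the comparison uses the caller's copy of the data, a
fault anywhere along the path shows up as a mismatch.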


Reality...

The higher-quality disk-systems can be counted upon to
either record data properly or provide notification if they
cannot.  Any "soft" errors are usually defeated with an
arsenal of ECC-recovery, retries with skew adjustments and
relocation of bad sectors, all handled automatically by the
disk-system hardware.

Communications problems with the cables are usually dealt
with when a new server is installed.  It either works or it
doesn't.  Maybe the cable and controller are too close to
some noisy circuit board, or there is a short in the cable.
But for the most part, once good
communication is established with the disk-system, it will
remain that way unless the server is moved or outfitted with
different boards.

So, depending on your comfort-level, the verification
feature is either burdensome overhead or a degree of extra
insurance in case of intermittent noise problems.


LANStor READ-AFTER-WRITE VERIFICATION

With the introduction of NetWare 2.1x, the O/S no longer
performed the verification automatically.  It became the
responsibility of the VADD to do the verification.  Novell
directed all VADD implementors to include this feature, and
no exceptions were permitted.  The Novell literature does
state that the O/S performs the verification.  Such a
statement is not inaccurate if the writer viewed VADDs and
LAN drivers as part of the O/S.

The first LANStor VADDs always did verification, which
involves an extra read following a write.  We later made it
an option because other vendors had excluded this feature
from the very start and we simply didn't perform as well
when compared to them in benchmarks.

If you choose to use the verification feature, LANStor will,
during its initialization, reserve one 4096-byte buffer for
each NetWare channel upon which the VADD is loaded.
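
As a quick check of the memory cost, the arithmetic can be
written out in Python.  The function name is hypothetical.
Note that 4096 bytes equals eight 512-byte sectors, the same
size as the block described under HOTFIX; this statement
does not say the buffer was sized for that reason, so treat
the match as an observation only.

```python
VERIFY_BUFFER_BYTES = 4096  # per channel, from the text above
SECTOR_BYTES = 512


def verify_buffer_total(channels):
    """Bytes reserved for verification buffers across all channels."""
    if channels < 0:
        raise ValueError("channel count cannot be negative")
    return VERIFY_BUFFER_BYTES * channels


# How many 512-byte sectors fit in one verification buffer.
sectors_per_buffer = VERIFY_BUFFER_BYTES // SECTOR_BYTES
```

For example, a server with the VADD loaded on two channels
would reserve verify_buffer_total(2) bytes, or 8192.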


LANStor AND DGROUP USAGE

The DGroup data area in NetWare is a 64k chunk of memory
somewhere in the file server.  The O/S, LAN drivers, VADDs,
VAPs and the machine stack all share this region.  All
third-party implementors were advised by Novell to curtail
their use of the DGroup area, for DGroup is practically all
used up even before one starts linking in third-party
software.  It is for this reason that all LANStor VADDs use
only about 10 bytes of the DGroup area.

LANStor does require memory, but it doesn't take it from
DGroup.  Instead, LANStor dynamically asks NetWare for the
memory that it needs.  The memory is in the form of segments
that are privately owned by LANStor; there are no conflicts
with any other process.

A concern that we had early on was the machine stack.  It is
located in DGroup and we were told it is pretty small to
begin with.  It gets reduced in size as third-party software
is linked with NetWare.  All VADDs that we have seen run
with interrupts disabled.  These VADDs can easily control
their stack usage, reducing the chance for a stack overflow.

LANStor, on the other hand, runs with interrupts enabled.
We do this so that LAN boards, printers, and other VADDs can
do their thing while we do ours.  Overall system throughput
is increased, along with the risk of stack overflows, since
nested interrupts can occur.  LANStor avoids overflowing the
shared stack by switching to its own local (and larger)
stack whenever it gets control.


ADAPTEC 2322B ESDI CONTROLLER

This controller lacks the ability to perform the read-after-
write verification function.  It does have a read-ahead
feature, which is a different thing altogether.  The read-
ahead function is simply used to optimize potential
sequential reads.

There appears to be a problem with the read-ahead feature,
though this has not been confirmed.  This is why Storage
Dimensions suggests that read-ahead be disabled when using
this controller.
